3D Lip-tracking for Audio-visual Speech Recognition in Real Applications
Abstract
In this paper, we present a solution to the problem of tracking 3D information about the shape of the lips from a 2D picture of a speaker. We focus on lip-tracking in audio-visual speech recordings from the Czech in-vehicle audio-visual speech corpus (CIVAVC). The corpus consists of 4 h 40 min of audio-visual speech of a driver, recorded in a car while driving in ordinary traffic. In real conditions, the head of the speaker (a car driver) can move and turn in various directions. To cope with these movements and to avoid recognition errors caused by the changing 3D position of the lips, our algorithm uses a 3D-model-based approach to lip-tracking. First, we present a method for creating and clustering the lip shape models. We derived 20 basic 3D models of the lip shape from real utterances of the speaker to represent all lip movements. Next, we describe an algorithm for finding the shape of the lips in a picture using image processing. We then present the application of a distance function for choosing the model that best represents the lip shape obtained by image processing. Finally, we present the results of the lip-tracking and its use for lip-reading.
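The model-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the distance function, the projection, or how head rotation is handled, so everything below is an assumption made for illustration: the function names (`project`, `contour_distance`, `best_model`), the orthographic projection after a rotation about the vertical axis, the mean Euclidean point distance, and the toy four-point lip models are all hypothetical and are not taken from the paper.

```python
import math

def project(points_3d, yaw):
    """Rotate 3D points about the vertical (y) axis by `yaw` radians,
    then drop z (orthographic projection). Assumed, not from the paper."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x + s * z, y) for x, y, z in points_3d]

def contour_distance(a, b):
    """Mean Euclidean distance between corresponding 2D contour points.
    A stand-in for the paper's unspecified distance function."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def best_model(observed_2d, models_3d, yaw_candidates):
    """Return (model_index, yaw, distance) for the 3D model and head
    rotation whose projection lies closest to the observed 2D contour."""
    best = None
    for i, model in enumerate(models_3d):
        for yaw in yaw_candidates:
            d = contour_distance(observed_2d, project(model, yaw))
            if best is None or d < best[2]:
                best = (i, yaw, d)
    return best

# Toy data (hypothetical): two 4-point lip models, nearly closed and open.
closed_lips = [(-1, 0, 0), (0, 0.1, 0.2), (1, 0, 0), (0, -0.1, 0.2)]
open_lips = [(-1, 0, 0), (0, 0.5, 0.2), (1, 0, 0), (0, -0.5, 0.2)]
models = [closed_lips, open_lips]

# Simulate an observation of open lips with the head turned by 0.3 rad.
observed = project(open_lips, 0.3)
index, yaw, distance = best_model(observed, models, [-0.3, 0.0, 0.3])
```

An exhaustive search over a small set of models and candidate rotations keeps the sketch easy to follow; with 20 models, as in the abstract, the same loop would still be cheap per frame.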
Similar resources
Real-Time Lip Tracking for Audio-Visual Speech Recognition Applications
Developments in dynamic contour tracking permit sparse representation of the outlines of moving contours. Given the increasing computing power of general-purpose workstations it is now possible to track human faces and parts of faces in real-time without special hardware. This paper describes a real-time lip tracker that uses a Kalman filter based dynamic contour to track the outline of the lips....
3D Lip Tracking and Co-inertia Analysis for Improved Robustness of Audio-video Automatic Speech Recognition
Multimodality is a key issue in robust human-computer interaction. The joint use of audio and video speech variables has been shown to improve the performance of automatic speech recognition (ASR) systems. However, robust methods in particular for the real-time extraction of video speech features are still an open research area. This paper addresses the robustness issue of audio-video (AV) ASR s...
Visual Human Face Tracking and its Application to Lip-Reading and Emotion Recognition
We describe a visual face tracking algorithm which tracks the global as well as the nonrigid face movement in three dimensions using one camera. It is based on 3D face models. Using the face movement constraints, it is possible to track robustly and in real time. The results of the face tracking have been used in a number of applications including lip reading and emotion recognition. In both cas...
Real-time lip-tracking for lipreading
This paper presents a new approach to lip tracking for lipreading. Instead of only tracking features on lips, we propose to track lips along with other facial features such as pupils and nostril. In the new approach, the face is first located in an image using a stochastic skin-color model, the eyes, lip-corners and nostrils are then located and tracked inside the facial region. The new approach ...
Publication date: 2004